library(ggplot2)
library(mvtnorm)
library(dplyr)
library(cowplot)

This example is a nonconjugate model for a bioassay experiment, a two-parameter example from the class of generalized linear models. Here, we use a grid approximation to compute the posterior distribution and to draw simulations from it.

The data

To test how toxic a drug is, it is often given to animals at various doses, and the number of adverse outcomes is then recorded. Often, the response is simply a dichotomous outcome: animal alive or dead. An example of such data:

d <- data.frame(log.dose=c(-0.86, -0.3, -0.05, 0.73),
                n.animals=c(5,5,5,5),
                n.deaths=c(0,1,3,5))
d %>% knitr::kable()
 log.dose   n.animals   n.deaths
    -0.86           5          0
    -0.30           5          1
    -0.05           5          3
     0.73           5          5
d <- d %>%
  mutate(prop=n.deaths / n.animals) 
d %>%
  ggplot(aes(x=log.dose, y=prop)) +
  geom_point(col="#E41A1C") +
  theme_minimal() +
  labs(y="Proportion of deaths",
       x="Dose (log g/ml)")

The model

It is reasonable to model the outcomes of the five animals within each group as exchangeable and independent. The data points \(y_i = \) n.deaths are then binomially distributed: \[y_i \,|\, \theta_i \sim \text{Bin}(n_i, \theta_i)\] where \(n_i\) is the number of animals in group \(i\) (in this example it is 5 for each group).

We model the death probability \(\theta_i\) as a function of the dose using a linear model together with a logit link: \[\text{logit}(\theta_i) = \alpha + \beta x_i\] This is called a logistic regression model.

We will estimate this model using a grid. To know where to place the grid, we first use a frequentist approach.

A frequentist approach

To get a rough estimate of where we should place our grid, we compute the maximum likelihood estimate using the standard logistic regression tools.

d.notagg <- data.frame(log.dose=c(rep(-0.86, 5), rep(-0.3,5), rep(-0.05,5), rep(0.73,5) ),
                death=c(rep(0,5), 1, rep(0,4), rep(1, 3), 0, 0, rep(1, 5)) )

fit <- glm(death ~ 1 + log.dose,
          data=d.notagg,
          family="binomial")

summary(fit)

Call:
glm(formula = death ~ 1 + log.dose, family = "binomial", data = d.notagg)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.37756  -0.64102  -0.07708   0.05473   1.83495  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.8466     1.0191   0.831    0.406
log.dose      7.7488     4.8727   1.590    0.112

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 27.526  on 19  degrees of freedom
Residual deviance: 11.789  on 18  degrees of freedom
AIC: 15.789

Number of Fisher Scoring iterations: 7

The estimate is \((\hat{\alpha}, \hat{\beta}) = (0.85, 7.75)\) with standard errors of 1.0 and 4.9 for \(\alpha\) and \(\beta\), respectively.
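A rough rule of thumb for covering the region of non-negligible posterior mass is the maximum likelihood estimate plus or minus a few standard errors. As a small sketch using the rounded values from the summary above:

```r
# MLE and standard errors, rounded from the glm summary above
est <- c(alpha = 0.85, beta = 7.75)
se  <- c(alpha = 1.0,  beta = 4.9)

# MLE +/- 3 standard errors as a rough plausible region
rbind(lower = est - 3 * se,
      upper = est + 3 * se)
# alpha roughly in (-2.2, 3.9), beta roughly in (-7, 22.5)
```

The grid chosen below, \(\alpha \in [-5, 10]\) and \(\beta \in [-10, 40]\), comfortably contains this region.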

Approximating the posterior using a grid

First, we define some functions.

logit <- function(x) log(x / (1-x) )
invlogit <- function(x) exp(x) / (1 + exp(x))
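As a quick sanity check, the two functions should be inverses of each other (base R also provides `qlogis` and `plogis` for the same purpose):

```r
logit <- function(x) log(x / (1 - x))
invlogit <- function(x) exp(x) / (1 + exp(x))

# applying invlogit after logit should recover the input
p <- c(0.1, 0.5, 0.9)
all.equal(invlogit(logit(p)), p)  # TRUE
```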

We compute the log posterior using the log likelihood. This helps to avoid numerical problems. Simplifying some of the expressions, we get the following function for the log likelihood:

log.lkhd <- function(alpha, beta) {
  lin <- alpha + beta * d$log.dose
  y <- d$n.deaths; n <- d$n.animals
  sum( y*(lin - log( 1 + exp(lin ) )) + ( n-y )*(-log(1 + exp(lin))) )
}
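The only simplification is dropping the binomial coefficient \(\log \binom{n_i}{y_i}\), which is constant in \(\alpha\) and \(\beta\). We can verify this against `dbinom`; a small self-contained check that restates the data and functions from above:

```r
invlogit <- function(x) exp(x) / (1 + exp(x))
d <- data.frame(log.dose  = c(-0.86, -0.3, -0.05, 0.73),
                n.animals = rep(5, 4),
                n.deaths  = c(0, 1, 3, 5))

log.lkhd <- function(alpha, beta) {
  lin <- alpha + beta * d$log.dose
  y <- d$n.deaths; n <- d$n.animals
  sum( y*(lin - log(1 + exp(lin))) + (n-y)*(-log(1 + exp(lin))) )
}

# dbinom includes the constant log(choose(n, y)); subtracting it
# should recover the simplified expression exactly
a <- 0.85; b <- 7.75
theta <- invlogit(a + b * d$log.dose)
full <- sum(dbinom(d$n.deaths, d$n.animals, theta, log = TRUE))
all.equal(log.lkhd(a, b),
          full - sum(lchoose(d$n.animals, d$n.deaths)))  # TRUE
```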

Next, we define a prior function. Since we use a uniform prior, we just define a constant function:

prior <- function(alpha, beta) {
  1
}

Next, we define our grid and a function to compute the posterior:

grid_size <- 100
alpha_seq <- seq(-5, 10, length.out=grid_size)
beta_seq <- seq(-10, 40, length.out=grid_size)

alpha_width <- alpha_seq[2] - alpha_seq[1]
beta_width <- beta_seq[2] - beta_seq[1]

post.grid <- expand.grid(alpha = alpha_seq, 
                         beta = beta_seq ) 

posterior.grid <- function(grid, prior_fun=prior) {
  grid %>%
  rowwise %>%
  mutate(loglkhd = log.lkhd(alpha, beta),
         prior = prior_fun(alpha, beta)) %>%
  mutate(log.post = loglkhd + log(prior) ) %>%
  ungroup() %>%
  mutate(log.postm = log.post - max(log.post),
         un.post = exp(log.postm),
         # normalize the posterior
         post = un.post / sum(un.post),
         prior = prior / sum(prior) ) %>%
    select(-log.postm, -un.post)
}

Now, we compute the posterior:

post.grid <- posterior.grid(post.grid)

We can plot the posterior density as contour lines. To get contour lines at sensible levels, we take the posterior mode and multiply it by 0.05, 0.15, …, 0.95.

mode <- max(post.grid$post)
breaks <- seq(0.05, 0.95, by=0.1) * mode

unf_post_plot <- post.grid %>%
  ggplot(aes(x=alpha, y=beta, z=post)) + 
  stat_contour(breaks=breaks, col="#377EB8") +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Posterior density", subtitle = "with uniform prior")
unf_post_plot

Sampling from the posterior

To sample from the posterior, we take the following steps:

  1. Compute the marginal posterior distribution of \(\alpha\) by numerically summing over \(\beta\):

marg.alpha <- post.grid %>%
  group_by(alpha) %>%
  summarise(post = sum(post)) %>%
  pull(post)

  2. Draw \(N\) samples (here \(N = 1000\); use more if you want a larger sample) from the discrete marginal \(p(\alpha | y)\):

N <- 1000
alpha.sample <- sample(alpha_seq, N, replace=TRUE, prob=marg.alpha)

  3. For each sampled \(\alpha\), draw \(\beta\) from the discrete conditional distribution \(p(\beta | \alpha, y)\):

beta.sample <- c()
for(i in 1:N) {
  cond.beta <- post.grid %>%
    filter(alpha == alpha.sample[i]) %>%
    mutate(post = post / sum(post)) %>%
    pull(post)
  beta.sample[i] <- sample(beta_seq, 1, prob=cond.beta)
}

  4. Add a uniform random jitter to each sampled \(\alpha\) and \(\beta\), centered at zero with a width equal to the spacing of the sampling grid. This gives the simulation draws a continuous distribution:

alpha.sample <- alpha.sample + runif(N, min = -alpha_width/2,
                                        max =  alpha_width/2)
beta.sample <- beta.sample + runif(N, min = -beta_width/2,
                                      max =  beta_width/2)

The whole thing as a function:

extract.sample <- function(density.grid, N=1000, prior=FALSE) {
  if (prior) {
    density.grid <- density.grid %>%
      select(alpha, beta, post=prior)
  } else {
    density.grid <- density.grid %>%
      select(alpha,  beta, post)
  }
  marg.alpha <- density.grid %>%
  group_by(alpha) %>%
  summarise(post = sum(post)) %>%
  pull(post)
  
  alpha.sample <- sample(alpha_seq, N, replace=TRUE, prob=marg.alpha)
  beta.sample <- c()
  for(i in 1:N) {
    cond.beta <- density.grid %>%
      filter(alpha == alpha.sample[i]) %>%
      mutate(post = post / sum(post)) %>%
      pull(post)
    beta.sample[i] <- sample(beta_seq, 1, prob=cond.beta)
  }
  alpha.sample <- alpha.sample + runif(N, min = -alpha_width/2,
                                          max =  alpha_width/2)
  beta.sample <- beta.sample + runif(N, min = -beta_width/2,
                                        max =  beta_width/2)
  
  data.frame(alpha = alpha.sample,
           beta = beta.sample)
}

We can now plot the posterior sample:

post.sample <- extract.sample(post.grid)

post.sample %>%
  ggplot(aes(x=alpha, y=beta)) +
  geom_point(size=0.5) +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Posterior sample")

We can plot the samples as the resulting logistic model together with the data:

plot_samples <- function(sample, n=100, title="Posterior samples", subtitle="") {
  log.dose <- seq(-1.5, 1.5, length.out = 100)

  sample %>%
    sample_n(size=n) %>%
    mutate(id=1:n) %>%
    purrr::pmap_df(~tibble(log.dose=log.dose, id=..3,
                          prop=invlogit(..1 + ..2*log.dose))) %>%
    ggplot(aes(x=log.dose, y=prop)) +
    geom_line(aes(group=id), alpha=0.2, col="#377EB8") +
    geom_hline(yintercept=0.5, linetype='dashed', col="grey") +
    geom_point(data=d, col="#E41A1C") +
    theme_minimal() +
    labs(x="Dose (log g/ml)", y="Proportion of deaths",
         title=title, subtitle = subtitle)
}
plot_samples(post.sample)

Note that we can also sample and visualize our prior distribution:

prior.sample <- extract.sample(post.grid, prior=TRUE)

prior.sample %>%
  ggplot(aes(x=alpha, y=beta)) +
  geom_point(size=0.5) +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Prior sample")

The prior is uniform and thus not very spectacular. For the logistic model, this then looks as follows:

plot_samples(prior.sample, n=200, title="Prior samples")

Posterior Distribution of the LD50

We can use the posterior sample to compute the LD50, the dose level at which the probability of death is 50%. In our logistic model, a 50% death probability means \[\begin{align*} \text{LD50}: && E\left(\frac{y_i}{n_i}\right) = \text{logit}^{-1}(\alpha + \beta x_i) = 0.5 \end{align*}\] Thus \(\alpha + \beta x_i = \text{logit}(0.5) = 0\) and the LD50 is \(x_i = -\alpha / \beta\).
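As a quick plug-in check, using the rounded maximum likelihood values from the frequentist fit above gives a point estimate of the LD50:

```r
# rounded MLE values from the glm fit
alpha.hat <- 0.85
beta.hat  <- 7.75

# LD50 point estimate on the log-dose scale
-alpha.hat / beta.hat  # about -0.11
```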

Attention: In this example, LD50 is a meaningless concept if \(\beta \leq 0\), in which case increasing the dose does not cause the probability of death to increase.

We report:

  1. The posterior probability that \(\beta > 0\), that is, that the drug is harmful:
mean(post.sample$beta > 0)
[1] 1

Since all 1000 draws have \(\beta > 0\), the posterior probability of \(\beta > 0\) is estimated to be close to 1 (roughly, it exceeds 0.999).

  2. The posterior distribution for the LD50 conditional on \(\beta > 0\). All draws had positive values of \(\beta\), so the distribution is given by the whole sample:
LD50_samps <- post.sample %>%
  mutate( LD50 = - alpha / beta) 
LD50.mean <- LD50_samps %>%
  summarise(mean = mean(LD50))

unif.LD50.plot <- LD50_samps %>%
  ggplot(aes(x=LD50)) + 
  geom_histogram(bins=50, 
                 fill="#377EB8", col="white") +
  scale_y_continuous(labels = NULL, name="") +
  xlim(-0.6, 0.7) +
  geom_vline(data=LD50.mean, aes(xintercept=mean), col="#E41A1C") +
  theme_minimal() +
  labs(title="Posterior distribution for the LD50",
       subtitle="with uniform prior")
unif.LD50.plot

We can incorporate the LD50 data in the plot of the logistic model:

plot_samples(post.sample) +
  geom_point(data=LD50_samps[1:100,], 
             aes(x=LD50, y=0.5), alpha=0.3, size=0.5)

A different prior

We want to replace the uniform prior density by a joint normal prior distribution on \((\alpha, \beta)\) with \(\alpha \sim \text{Normal}(0, 2^2)\), \(\beta \sim \text{Normal}(10, 10^2)\), and \(\text{corr}(\alpha, \beta)=0.5\).

mvn_prior <- function(alpha, beta) {
  # covariance matrix: variances on the diagonal,
  # covariance corr * sd(alpha) * sd(beta) = 0.5 * 2 * 10 off the diagonal
  Sigma <- matrix(c(2^2, 2*10*0.5, 2*10*0.5, 10^2), ncol=2)
  dmvnorm(c(alpha, beta), 
          mean=c(0, 10),
          sigma=Sigma)
}
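The covariance matrix entries follow from \(\text{cov}(\alpha, \beta) = \rho\, \sigma_\alpha \sigma_\beta = 0.5 \cdot 2 \cdot 10 = 10\); spelled out as a small check:

```r
sd.alpha <- 2; sd.beta <- 10; corr <- 0.5

# build the covariance matrix from standard deviations and correlation
Sigma <- matrix(c(sd.alpha^2,                corr * sd.alpha * sd.beta,
                  corr * sd.alpha * sd.beta, sd.beta^2), ncol = 2)
Sigma                  # diagonal: 4 and 100; off-diagonal: 10
cov2cor(Sigma)[1, 2]   # recovers the correlation, 0.5
```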

prior.density <- function(grid, prior_fun){
  grid %>%
    rowwise %>%
    mutate(prior = prior_fun(alpha, beta)) %>%
    ungroup() %>%
    mutate(prior = prior / sum(prior)) 
}

mvn.prior.grid <- prior.density(post.grid, prior_fun=mvn_prior)

Our prior density then looks as follows:

mode <- max(mvn.prior.grid$prior)
breaks <- seq(0.05, 0.95, by=0.1) * mode

mvn.prior.grid %>%
  ggplot(aes(x=alpha, y=beta, z=prior)) + 
  stat_contour(breaks=breaks, col="#E41A1C") +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Prior density")

We can visualize the samples again:

prior.sample <- extract.sample(mvn.prior.grid, prior=TRUE)

prior_points <- prior.sample %>%
  ggplot(aes(x=alpha, y=beta)) +
  geom_point(size=0.5) +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Prior sample", subtitle="with multivariate normal prior")

prior_model <- plot_samples(prior.sample, n=200, 
                            title="Prior sample", subtitle="with multivariate normal prior")
plot_grid(prior_points, prior_model)

We can see that the prior still allows a wide range of different models, but unlike the uniform prior, it concentrates its mass in a range that is already close to the observed data.

The new posterior density

We now use this prior to compute our posterior density.

mvn.post.grid <- posterior.grid(post.grid, prior_fun=mvn_prior)

We can compare our new posterior density with the old one (obtained with the uniform prior):

mode <- max(mvn.post.grid$post)
breaks <- seq(0.05, 0.95, by=0.1) * mode

mvn_post_plot <- mvn.post.grid %>%
  ggplot(aes(x=alpha, y=beta, z=post)) + 
  stat_contour(breaks=breaks, col="#377EB8") +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Posterior density", subtitle = "with multivariate normal prior")

plot_grid(unf_post_plot, mvn_post_plot)

We can see that our new prior is slightly more regularizing than the uniform prior: The new posterior density is a bit tighter than the old posterior.

We can also compare the maximum a posteriori estimates: With the uniform prior, we have as MAP estimate:

post.grid[which.max(post.grid$post),]
# A tibble: 1 x 6
  alpha  beta loglkhd  prior log.post    post
  <dbl> <dbl>   <dbl>  <dbl>    <dbl>   <dbl>
1 0.909  8.18   -5.90 0.0001    -5.90 0.00322

and with the multivariate normal prior, we get:

mvn.post.grid[which.max(mvn.post.grid$post),]
# A tibble: 1 x 6
  alpha  beta loglkhd    prior log.post    post
  <dbl> <dbl>   <dbl>    <dbl>    <dbl>   <dbl>
1 0.758  8.18   -5.92 0.000613    -10.8 0.00403

While the MAP estimate of \(\beta\) is the same (at least at our grid resolution), the MAP estimate of \(\alpha\) under the new prior is a bit closer to zero, the prior mean of \(\alpha\).

Sampling from the new posterior

mvn.post.sample <- extract.sample(mvn.post.grid)
mvn.prior.sample <- extract.sample(mvn.prior.grid, prior = T)

post_plot <- mvn.post.sample %>%
  ggplot(aes(x=alpha, y=beta)) +
  geom_point(size=0.5) +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Posterior sample", subtitle = "with multivariate normal prior")

prior_plot <- mvn.prior.sample %>%
  ggplot(aes(x=alpha, y=beta)) +
  geom_point(size=0.5) +
  ylim(-10, 40) +
  scale_x_continuous(breaks=c(-4, -2, 0, 2, 4, 6, 8, 10), 
                     limits = c(-5, 10)) +
  theme_minimal() +
  labs(title="Prior sample", subtitle = "with multivariate normal prior")

plot_grid(post_plot, prior_plot)

We again use the sample to visualize the posterior for the logistic model:

plot_samples(mvn.post.sample) 

We can compare this again to the prior samples, which also shows how the posterior is a compromise between the data and the prior.

plot_samples(mvn.prior.sample) +
  labs(title="Prior samples")

New LD50

mvn.LD50_samps <- mvn.post.sample %>%
  mutate( LD50 = - alpha / beta) 

mvn.LD50.mean <- mvn.LD50_samps %>%
  summarise(mean = mean(LD50))

mvn.LD50.plot <- mvn.LD50_samps %>%
  ggplot(aes(x=LD50)) + 
  geom_histogram(bins=50, 
                 fill="#377EB8", col="white") +
  scale_y_continuous(labels = NULL, name="") +
  xlim(-0.6, 0.7) + 
  geom_vline(data=mvn.LD50.mean, aes(xintercept=mean), col="#E41A1C") +
  theme_minimal() +
  labs(title="Posterior distribution for the LD50", 
       subtitle="with multivariate normal prior")

plot_grid(mvn.LD50.plot, unif.LD50.plot)